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ABSTRACT 

A conceptual model for parallel computations on large arrays is 
developed in this paper. The mode] provides a set of language concepts 
appropriate for processing arrays which are generally too large to fit 
in the primary memories of a multiprocessor system. Ttie semantic model 
is used to represent arrays on a concurrent architecture in such a way 
tliat the performance realities inherent in the distributed storage and 
processing can be adequately represented. An implementation of the 
large array concept as an Ada package is also described. 
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1. Introduction 


One oi the major driving forces behind proposals for large-scale 
parallel multiprocessor architectures, such as CEDAR [4], MPP [3], 
TRAC fl6], and Blue CHIP (17], has been the need for more processing 
power for large scientific and engineering applications. A major issue 
in the effective use of such systems is the design of the input/output 
aspects of the system, that is, the methods by which large quantities 
of data may be effectively moved from secondary storage into the 
system, routed to the appropriate processing elements for processing, 
and then back to secondary storage. For example, in the NASA Massively 
Parallel Processor CMPPJ, designed for processing LANDSAT-D satellite 
images, the processing array has over 16000 processing elements (128 x 
128 array of PE’s), but each image consists of a 6000 x 6000 array of 
pixels. To process such an image, more than 2000 blocks of data must 
be moved from secondary storage through the PE's and back to secondary 
storage. 

In scientific and engineering problems, the large data objects of 
interest are usually arrays. Typically these arrays are too large to 
fit in the central memory of even a large sequential computer. In a 
multiprocessor, such an array may occasionally be distributed only 
across the primary memories of the PE’s, but generally it must be 
partitioned between secondary storage and the primary memories. 

Appropriate hardware and operating system software for managing 
the distributed storage and processing of laige data objects in a 
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parallel system are only part of the problem. Stevenson and 
Perrott fl3] in a survey of the problems encountered in the use of 
ILLIAC-IV report that one of the major programming problems was the 
fact that the users had to resort to the use of relatively lowlevel 
assembly language programs to handle the "backing store traffic” 
required to move data in and out of the machine. If applications 
programmers are to make effective use of a parallel system, a high- 
level view of data storage and movement is needed at the applications 
level. 

Unfortunately, there are major performance realities associated 
with data movement within a parallel system, both for data moving into 
and out of the system €uid for data moving between processing elements. 
A high-level abstraction should not hide these performance realities 
from the programmer, a point strongly emphasized by Jones and 
Schwarz [6] in their study of experience with multiprocessor 
programming. Thus what is required is a high-level abstraction that 
can reflect accurately the performance realities of a parallel system, 
so that the applications programmer can make effective use of the 
system without resorting to low-level primitives. 

In this research, attention is restricted to arrays as the large 
data objects of interest, because arrays form the major large 
structures in many scientific and engineering applications. In this 
context there are two coupled problems for which we seek an effective 
language treatment; 
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(1) The partitioning of an array and its distributed storage on both 
secondary storage and processor memories in a multiprocessor 
architecture. To process a large data structure such as an array 
requires a complex series of data partitionings and data movements 
through the distributed system. Ue seek a solution that frees the 
user from the task of managing directly these partitionings and 
movements, without masking the performance realities involved. 

(2) The partitioning of an array cuid its concurrent processing by the 
separate processors of the multiprocessor architecture. To 
effectively utilize a multiprocessor architecture, the total 
computational task must be divided into subtasks capable of 
executing in parallel. The subtasks need to access the data 
structures involved simultaneously so as to be able to run 
concurrently. The language should provide the appropriate higher 
level primitives to express the appropriate division and sharing 
of the data among these subtasks. These primitives should allow 
large"scale parallel computation on the data structure without a 
major overhead in subtask communication for the purpose of 
synchronizing access to the data structure. Subtasks should be 
able to traverse the data structure without unnecessary mutual 
exclusion from concurrent access. 

The goal of this research is to develop a conceptual semantic 
model for parallel computations on large arrays that addresses these 
two issues and that can be effectively implemented on a variety of 
multiprocessor architectures. The next section provides some 
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background for the study. The following section presents the model, 
with a rationale for the major design decisions. A general 
implementation strategy, presented as an Ada* package, together with an 
example of use of the model, follows. The characteristics of a 
particular architecture obviously give rise to different performance 
characteristics, and in some cases, necessary restrictions on the 
model. Implementation strategies for various typical multiprocessor 
architectures appear in [10]. Architectures based on shared memory and 
architectures with no shared memory between processors suggest somewhat 
different solutions to the design problem. The semcuitic model 
presented here allows an effective solution on both types of 
architecture. 

2. Background 

Languages for programming multiprocessor architectures have 
largely ignored the problems associated with concurrent processing of 
large data structures. For example, languages proposed for SIHD 
machines, e.g., PASCALPL [19], IVTRAN [11], and ACTUS [12], extend 
sequential languages such as PASCAL or FORTRAN to take advantage of the 
element-level parallelism exhibited by SIMD architectures. Thus a 
statement of the form: 

A := B + C 

where A, B, and C are arrays of the same size, implies that for all I 
* Ada is a registered trademark of the Department of Defense 
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and J the following statements 

AU.J] s= BlI.Jl + C(I,J] 

is to be executed in parallel. For array sizes smaller than the size 
of the hardware matrix of processing elements, the above statement can 
often be executed in a single machine cycle. But if the sizes of B and 
C are larger than the hardware matrix, then they have to be partitioned 
into blocks before the computation can be carried out. Typically the 
languages provide only rudimentary constructs for making this partition 
and managing the movement of data between secondary storage and 
processing element memories. 

For MIMD architectures, the programming problem is usually more 
difficult. The large data structure not only has to be distributed 
among processor memories and secondary storage, but the separate 
asynchronous tasks involved may require different size partitions of 
the data. Languages designed for HIMD architectures, such as CSP (5], 
ARGUS [81, and Ada [21, usually provide no special features for 
processing large data objects other than files. Data objects, tdiether 
large or small, are considered to be owned by a single task and the 
access to the data structure is controlled by the task. To effectively 
utilize a multiprocessor architecture while processing a large data 
structure in parallel, independent tasks need to have the capability of 
accessing independent parts of the data structure concurrently. Of 
course, two tasks have to be prevented from simultaneously updating the 
same portion of the data structure. Unguages designed for WlttD 
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processors do not provide the higher level view of distributed control 
of a single data structure while providing mutual exclusion between the 
tasks where appropriate. 

Language concepts for file processing, in fact, are similar in 
important ways to those required for large array processing, as 
discussed in more detail in the next section. The concept of a 
•'moveable window”, in particular, is useful for both file and large 
array processing, where the window represents a part of the data object 
currently available for processing. The utility of “window" concepts 
for partitioning and processing arrays has occasionally been recognized 
in sequential languages, most notably in the language OL/2 [14]. In 
OL/2 a powerful and flexible mechanism for defining and using windows 
on arrays is developed, which allows an array to be partitioned for 
processing in very natural ways. The mechanism allows windows to be 
hierarchically decomposed into smaller windows to as many levels as 
desired. We develop a similar but simpler set of window concepts for 
the case of distributed storage and processing, a case not considered 
in the OL/2 design. 

2. A Model isiL Array Pr<?<;??gin.q 

In defining a model for concurrent processing of large arrays, the 
roots of our approach are found in traditional language constructs for 
file processing rather than array processing. In sequential languages 
linear arrays and files are distinguished, although structurally each 
is a linear sequence of elements of some type. Files, however, have 
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several characteristics that distinguish them from arrays. For this 

discussion, the most important are: 

(IJ A file is a distributed data structure, stored primarily on 
secondary storage, rather than in central memory. Only a part is 
presurae4 to be available in central memory for processing at any 
time. This latter fact is reflected by using a window (buffer 
variable in PASCAL) on the file which makes only a part of the 
file (usually a single element) visible to the program at a time. 

(2) Processing a file consists of alternating steps of (a) positioning 
the window on the file and (b) processing the element or elements 
visible within the window. Movement of the window to a new 
position is conceptually a separate step from the processing 
itself . 

(3) A file has a lifetime (potentially) longer than that of the 
programs processing it, and hence the structure of the file is 
defined independently of the progreuns that access it. 

(4) Implementation of file processing usually involves a limited form 
of concurrency. Typically two processes cooperate, one executing 
the user program and the other managing the buffering of blocks of 
data between secondary storage and buffers in central memory. 
From the buffer a local copy of the data visible in the processing 
window is provided to the program. As the window is moved, the 
buffer manager process determines when transfer of an entire block 
to secondary storage is necessary. The user is effectively 
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protected from managing these transfers himself, but the language 
concepts of "windoM” and "moving the window" reflect more 
abstractly the performance realities inherent in the 
implementation structure. 

The model developed here for large array processing utilizes 
similar concepts. In this section the major features of the model are 
presented in a machine and language independent form. We state each of 
the major semantic aspects of the model, together with the rationale 
for its inclusion. In the next section, the model is presented more 
fully as an Ada "large array" abstract data type (package), which then 
provides both a syntax and an implementation model in terms of Ada 
semantics. The major features of the model are the following: 

1. Large array organization . A "large array" is seen by the 
programmer as a single data object with the same logical organization 
as an ordinary array. That is, a one-dimensional large array is a 
linear sequence of homogeneous elements, a two-dimensional large array 
is a grid of rows and columns, and so forth. 

Rationale . The programming burden is greatly simplified if the basic 

formulations of algorithms in terms of matrix algebra and other array 
« 

processing structures can be retained without a major reorganization. 

2. Lifetimes of large arrays . The lifetime of a "large array" 
ordinarily is different from that of a program which processes it. 
Thus its structure is defined independently of a program, and a program 
gains access to it through an OPEN operation similar to that used for 
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access Co a file. A CLOSE operation terminates access to the array 
without destroying it. 

Rationale . In applications, large arrays ordinarily represent data such 
as images or structural models that are constructed by other software 
systems or by other phases of a large program complex. They are often 
saved between processing phases and may be processed repeatedly by 
several different programs. 

3. Windows . A “large array” is not visible to a single task as a 
unit at one time. Instead, a task needing to access it defines a 
"window” on it, where a window is a subarray of the whole. The 
elements visible in the window are available to the task for 
processing, treating the window elements as the elements of an ordinary 
(small! array. 

Rationale . Restricting visibility to a window allows distributed 
storage of the array, with only the window elements present in the 
primary memory of the processor running the task that owns the window. 
The remainder of the array may reside on secondary storage, in other 
processor memories, or in other parts of a memory hierarchy. 

1. Fixed - size windows . The size of a window is fixed when it is 
created and remains invariant throughout its lifetime. 

Rationale . Dynamically changing window sizes (e.g., as found in OL/2 
fl41) require substantial run-time overhead for referencing and storage 
management. Fixed size windows which can overlap array boundaries (see 
* 10 below! are simpler to implement, although somewhat less flexible. 
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5. Moyenient fil windows . A window may be positioned at any 
arbitrary point on a large array (absolute movement) or moved from its 
current position to a new position defined in terms of its current 
position (relative movement), e.g., by moving a specified distance in a 
specified direction relative to the current position. Thus if the 
entire array is to be processed, a window is positioned at an initial 
position and then processing alternates with relative movement until 
the entire array has been traversed. 

Rationale . "Window movement" represents abstractly the primary cost 
associated with large array processing, because moving a window 
ordinarily involves some transfer of data in and out of primary memoi'y. 
The abstraction allows use of various blocking and buffering mechanisms 
in the implementation, as described in the following section. In 
addition, regular traversal patterns can be tied to iteration 
structures in the underlying language (e.g., the "iterators" of CUI 
(7)) with the implementation providing prefetching of the next block of 
data into a staging area during a previous iteration, thus overlapping 
processing in a window with the input-output required lor a later 
iteration. 

g. Access rights windows . A window may be restricted to be 
"read-only", "write-only", or may be "read-write". A read-only window 
allows referencing but not assignment of values visible in the window. 
A write-only window allows assignment of new values only. 

Rationale . Read-only windows provide different performance 
characteristics from read-wiite windows, because read-only windows 
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require fewer data transfers and less mutual exclusion between tasks. 
In general, to maintain consistency of the data in a large array, only 
a single read-write window may be positioned over any given element, 
but multiple read-only windows over an element are acceptable. Write- 
only windows need not be initialized from secondary storage and require 
fewer data transfers when moved. Write-only windows are natural when 
extending an array or filling a newly created array. 

7. Windows provide a local copy of the data . A window provides a 
local copy of the data visible in the large array. Updating of the 
actual large array values occurs only when the window is moved to a new 
position, or through an explicit WRITE operation, or when the window is 
CLOSE’d. An explicit READ operation allows updating the local copy of 
the large array values without window movement (e.g. when one task may 
be writing in an area that another is reading). Similarly the WRITE 
operation updates the large array with values from the local copy of 
the window without moving the window. 

RaiJLsnalfi.- For performance reasons again, we do not want updating of a 
window element via assignment to imply input-output to secondary 
storage. Instead only window movement (or explicit WRITE or CLOSE of a 
window) should cause input-output. This design allows efficient use of 
fairly large windows that minimize secondary storage traffic. 

2. Multiple wlndQws 20 different arrays . A task may have windows 
on several different arrays simultaneously and these windows may be 
moved asynchronously. A window may be opened or closed on an array at 
any time during execution. 



12 


Rationale . Many algorithms require simultaneous access to several 
arrays, e.g., to scan one while creating another. 

Multiple windows on the same array . Several tasks may have 
windows concurrently on the same array. To preserve the basic 
integrity of the data as these windows move, a read-write or write-only 
window (an update window) cannot overlap other update windows. A 
read-only window is allowed to overlap with other types of windows. A 
task wishing to move an update window to a position overlapping the 
position of another update window must wait until the latter window is 
either moved or closed. The checking eind waiting involved is part of 
the HOVE operation semantics and requires no special programmer action. 
Rationale . In working with large numbers of tasks moving windows 
asynchronously on the same array, it is relatively simple for the 
language run-time support programs to provide mutual exclusion cind 
waiting when windows collide (as detailed in the next section). To 
require the programmer to do the checking explicitly would make 
programming much more difficult. Of course the mutual exclusion 
provided by the system raises the issues of deadlock and starvation, 
which must be addressed when implementation is considered. 

IQ. Uindows and array boundaries . A window may extend beyond the 
boundary of the array. At the time of creation of a read-only or 
read-write window, a default value for elements beyond the array 
boundary may be specified. The default value is returned for any 
reference to such an element. Assigning a value to an element outside 
the array boundary changes only the local copy of the window elements. 
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The array itself is not extended by such an assignment. When the 
window is moved, values outside the array boundary are lost. The single 
exception to this rule is the case of a linear array with a write-only 
window positioned just past the last element (greatest subscript). 
Assignments within such a window extend the array and are retained when 
the window is moved. An *end-of-structure" function allows a program 
to test whether a window extends over an array boundary. 

Bationalg. "Edge effects" when moving a window within a large array 
are a major source of programming problems. For example, when 
processing a symmetric matrix, one often wants to store only the upper 
triangular portion and move a window down the rows, starting with the 
element on the main diagonal. The rows get shorter as the window 
approaches the lower-right corner of the matrix, and it is convenient 
if the same window can be used, simply extending past the matrix 
boundary as needed. Similarly, if processing an image (matrix of 
pixels) with a fixed size window moving in fixed size increments, it is 
convenient if the window can overlap the image boundary at the end of 
each processing sweep. 

Writing new values outside the array boundary has a simple 
semantics and implementation only in the case of extending a linear 
array. For a matrix, the semantics is complex if an extension of the 
array is intended, and the implementation is also troublesome. For 
simplicity, therefore, no dynamic extension of large arrays is allowed, 
except in the simple case. The "end-of -structure" function is the 
analogue of the usual "end-of-file“ function for files and has similar 
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uses. 


il. Siibwindows . A task may subdivide a window into smaller parts 
called subwindows . A subwindow is a subarray of a window. A subwindow 
may be passed as a parameter to a subtask (another task) for concurrent 
processing. However, a subwindow is semantically distinct from a 
window in three primary ways: (a) a subwindow cannot be moved, (b) 
assignment to a subwindow element implies immediate update of the 
corresponding window element, cind (c) overlapping subwindows directly 
access the same element of the window (there are no local copies of 
subwindow elements). Processing using subwindows proceeds in distinct 
phases: (a) The main task positions its window appropriately on a large 
array, (b) subwindows are passed to sub tasks for processing, and (c) 
when all subtasks have terminated (or paused) the main task moves its 
window to a new position and the cycle repeats. 

Rationale . Processing using windows implies some secondary storage 
traffic and system overhead for mutual exclusion, particularly where 
several tasks are processing the same large array simultaneously. 
Subwindows allow several tasks to operate on the same part of a large 
array without major system overhead. Because subwindows cannot be 
moved independently, no secondary storage traffic is involved. Because 
no local copies of array elements are maintained and assignment causes 
immediate update of an element in all subwindows within which it is 
visible, the programmer is responsible for mutual exclusion and 
consistency of the data, rather than the system. For many 
applications, this alternative is appropriate. For example, a window 
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may be positioned over several dozen rows of a large array. A subtask 
may be initiated to process each row, with a subwindow on the 
appropriate row. Since these subwindows can never overlap, consistency 
within the window is not an issue, and the system overhead for checking 
is not required. Also, secondary storage traffic is greatly reduced by 
moving the large window as a unit rather than each "row" subwindow 
individually. At the same time, system checking and mutual exclusion 
on movement of the larger window allows other tasks in the system to 
process the same large array for other purposes concurrently. Shared 
memory architectures allow subwindows to be implemented as simple 
descriptors (dope vectors) with essentially no overhead in their 
concurrent processing. In non-shared memory architectures, references 
and assignments to subwindows require communication to the task owning 
the window, which may substantially increase the cost of use. For this 
reason, subwindows are particularly attractive for use where tasks have 
shared memory available. 

£yatnples. Several short examples are presented here, with a more 
extended example in a following section. Since we have no syntax for 
programming these examples, only an intuitive sketch of the semantics 
is provided. 

Ex^niPl? 1 . ilndinarY Sgauential file processing . Sequential file 
processing is the simplest case of large array processing. A task 
positions a read-only window at the start of a large linear array (to 
read the array file). The window is the equivalent of the PASCAL 
“buffer variable". Processing the element(s) within the window 
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alternates with advancing the window until the entire array has been 
traversed. To write the array, a write-only window is positioned one 
element beyond the end, eind assignment to the window element alternates 
with forward movement until all data has been written. Note that the 
large array semantics allows the usual implementation structures for 
file processing to be adapted to this larger conceptual model for array 
processing. 

f x^qiPle 2. UNlX^ - like pipes . A pipe in UNIX-like systems is a file 
that is written by one task while a second simultaneously reads and 

processes the newly written elements. The second • task may lag 

arbitrarily far behind the first in its processing. The large ari'ay 
model allows this structure to be represented by the first task having 
a write-only window at the end of the array-file (as in Example 1 

above) and the second task having a read-write window positioned on 

some existing part of the array and advancing toward the first window. 
Note however that the array is not “consumed" by the second task 
reading it, so several tasks may have windows that allow them to' read 
and process the array concurrently. 

Ex^mPl^ 2- £ac. tQr .iha a banded symmetric matrix . In solving a large 
system of equations in matrix form: 

A X = B 

the matrix is often symmetric and "banded", meaning each row has only a 
^ UNIX is a trademark of Bell Laboratories. 
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maximum of some constant "beta” non-zero elements, starting with the 
element on the main diagonal. “Beta" is termed the "bandwidth” of the 
matrix. Only the elements within the non-zero band are stored, as an N 
X “beta" matrix if the original matrix was N x N. A factoring 
algoritluD typically might work down the band, instantiating “beta“'*’l 
tasks, each of which works on one row of the matrix at a time. The 
first task takes the first row, the "pivot row", and sends it to the 
other tasks. Each of these tasks uses the pivot row to perform 
calculations on the row it is currently assigned. Figure 1 illustrates 
this processing structure. 
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Only "beta"+l Casks are needed because only the rows with at least one 
non-zero element in the same column as a non-zero of the pivot row need 
to be processed in each step. 

After sending the pivot row to all other tasks, the first task 
moves its window down to the row "beta"*-l rows below its current row. 
The second task then has the new pivot row, and the process repeats. 
The tasks progressively move their windows down the band until the 
entire array is processed. 

4. Aq Implementation tlodel 

In the last section an overview of the language concepts needed 
for concurrent processing of large arrays has been provided. In this 
section an implementation model for these concepts is sketched briefly. 
An actual implementation as an Ada package is presented in the next 
section. 

The implementation model for large arrays parallels to some extent 
the implementation of file processing on sequential machines. As noted 
before, for sequential file processing, a buffering process manages the 
transfer of blocks of file components between the secondary and primary 
storage, copying the components of the file into and from the buffer 
variable associated with the file as needed. The concept of windows on 
a large array, as seen in the last section, already embodies the idea 
of a private copy of the elements of the large array visible through 


the windows. 
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The large array itself is divided into subarrays or blocks for 
implementation purposes as shown in Figure 2. The 1/0 of the large 
array is performed in terms of the blocks, i.e., the large array is 
stored on the secondary storage as blocks and the blocks are 
transferred in and out of primary storage as needed. 


A window, when stationed on the large array, may fall completely 
within a block or may overlap several blocks (see Figure 2). To 
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provide a copy of the elements, the appropriate portions of the blocks 
covered by the window are transferred sequentially to the window. Uhen 
the window is moved, the old values are written back. This implies 
that each of the blocks overlapped by the window in the old position 
must be updated. 

The division of large arrays into blocks also supports 
concurrency. As long as two windows are stationed on the large array 
so that they cover totally separate blocks, all processing required lor 
the movements of the windows can be performed in parallel. Such a 
situation is depicted by the two windows on the left in Figure 2. On 
the other hand, if the windows are stationed so that they overlap 
common blocks (the two windows on the right in Figure 2), then access 
to the common blocks has to be serialized so as to preserve the 
integrity of the data. The tasks owning the windows can still process 
the window copies of the elements in parallel, with the serialization 
occurring only during the reading and writing process between the 
window and the actual elements of the large array. 

A block of the large array is required to be resident in the 
primary memory only if there is a window covering it. Thus a simple 
demand paging policy can be implemented wherein a block is paged in 
from secondary storage only when a window moves onto it. The block can 
be replaced when the window moves on and no other window has covered it 


in the meantime. 
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The model presented above attempts to find a solution to the two 
related problems in the concurrent processing of large arrays as 
discussed at the beginning of the section. Windows provide a means of 
specifying the elements of a large array needed by the task for further 
processing without the programmer having to code the details of data 
movement between the primary and secondary storage. Tasks can use 
independent windows stationed on the same large array to process the 
array in parallel. The windows mutually exclude each other from the 
same portion of the array (when required). This provides for the 
maximum possible concurrent activity with no window unnecessarily 
inhibiting the processing in another window when such mutual exclusion 
is not needed. The model allows the language structure to reflect the 
performance realities of distributed storage and processing without 
unduly burdening the user with implementation details. 

^ Ma implementation si Large Arrays 

The model for processing large arrays in parallel needs to be a 
part of a system which provides other primitives necessary for 
distributed processing such as creation of independent tasks, 
comniunication between tasks, etc. To provide a precise semantics (and 
syntax) for the concepts of the model, we can consider "large array" as 
a new abstract data type. Although Ada does not provide any higher 
level constructs for the distributed control of data structures, it 
does provide a generic package facility which may be used to define 
this new abstract data type. The Ada tasking facility provides a 
"virtual computer" that may be considered as an abstract distributed 



22 


machine. Within this machine, the large array package defines a 
detailed implementation of the model for processing of large arrays in 
parallel. The package consists of all the necessary type definitions 
for large arrays, windows and subwindows along with procedure and 
function definitions for the relevant operations on the above defined 
types. Such a package definition provides a clear and precise syntax 
(although not entirely ideal) for creating and operating on large 
arrays in algorithms designed for concurrent processing of arrays. 

Since the large array package is coded in Ada, a working knowledge 
of the language is assumed for the purposes of describing the package. 
The large array package has been implemented on a VAX 11/780 using the 
UNIX implementation of the NYU/Ada ED translator and interpreter. 

g.l. Ihs. Large Array Package 

In this section the basic structure of the package is described. 
Host of the scientific applications involving large arrays generally 
require only two-dimensional arrays. Hence the package, as presented 
here, supports only two-dimensional large arrays. The extension of the 
package to higher dimensioned arrays is straightforward. The public 
part of the package has been provided in Appendix A. For a detailed 
listing of the package refer to flOJ. 

^.1.1. Large Arrays 


The generic package LARGE _AMAY_PKGE can be instantiated with 
any predefined or user defined type being supplied as actual parameter 
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to match the formal parameter ELEMENT. The following statements 
declare t'LOAT_ARRAY as a new instantiation of the generic package to 
support arrays with elements of type FLOAT t 

package FLOAT ARRAY is new LARGE ARRAY PKGE(FLOAT); 
use FLOAT_ARRAY; 


Preexisting arrays in external files can be accessed by attaching 
one of these large arrays via the procedure OPEN_LARGE_ARRAY »diile 
new arrays can be created using the procedure CREATE _LARGE_ARRAY . 

External files are named by a chatacter string- The row and column 
bounds of a large array are specified at the time of associating the 
large array with an external file and remain fixed during the existence 
of the arrays. For excimple 


A, X : LARGE_ARRAY; 


OPEN_LARGE_ARRAY ( A, 


row_low_bd 

=> 1, 

rou^high_bd 

=> n. 

col_lou_bd 

1. 

col_high_bd 

=> n, 

name 

=> "A 




CREA TE_LARGE_ARRAY 


C X, 

row_lou_bd -> 1, • 
rou_tiigh_bd -> n, 
col_lou_bd => 1, 
col_high_bd => m, 
name ~> "X_Iile“J; 


* Ada allows parameters to a particular subprogram either as a list, 
which associates actual and formal parameters by position, or by 
preceding the actual parameter by ‘'formal_narae =>“ where formal_narae is 
the name of the associated formal parameter. Explicitly named parame- 
ters are used for clarity in many of these examples, but positional 
parameters are used where no confusion is likely. 
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A and X are declared as UiRGE_ARRA'is . The large array A is 
associated with an already existing external file A_Iile and is 
specified to be an n by n array while a file X_file is created to be 
associated with the n by m large array X. Procedure 

CLOSE_LARGE_ARRAY can be used to sever the association of an internal 
large array with its associated external file while procedure 
DELETE_LARGE_AHRAY deletes the associated external file; 

Internally a large array is viewed as a sequence of blocks, each 
block being defined as a square subarray of the large array as shown in 
Figure 2. The blocks are numbered sequentially starting from the top 
left corner and increasing along the columns and then rows. 

The row and column bounds are used to determine the total number 
of blocks needed for the large array. For each of the blocks 

constituting the large array a ”monitor""like task ■ BLOCK_CONTROLL£R 
- is initiated to control access to the block. The reading in and 
writing out of the blocks to secondary storage is performed by the 
associated task as and when required. Note that the BLOCK_CONTROLLER 
tasks are internal to the package and are not visible to the user. 

Wi n do w s 

Windows are attached to a particular large array using the 
procedure GREAT E_WINDOU . The same procedure is also used to specify 

the size of the window and its privileges, i.e., whether it is read- 
onlyCRJ, write-only(U) or read-write(RW) . The row and column 
increments to be used for relative movement of the window are also 
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passed as parameters to the procedure. The user can specify an 
edge.element which will be used to fill out the portion of the window 
which does not lie within the bounds of the associated large array when 
the window is moved past the edge of the array. The statements 


A_wlndow, X_*titidow : UINDOU; 

CHEATS.UINDOUC A_uindou, 

rotf_size -> n, col_size => n, 
inmode -> R, 

routine => 0, col_inc -> 0, 

ar -> A, 

edge => FALSE, edge_element => 0.0); 

CREATF._UINDOU( X_uindou, 

rou_size => n, col_size => 1, 

inmode => U, 

row_inc => 0, col_inc => 1, 

ar => X, 

edge -> TRUE, edge_element -> 1.0); 

declare two windows A_»indou and X_windou. The window A.uindow 
is associated with the large array A. It is a read-only window and 
its size is n by n. The move increments are set to zero, implying that 
it will not be moved via relative movements. The X_uindow is a n by 
1 write-only window associated with the large array X. The move 
increments are CO.l). i.e., each relative move will displace it one 
column to the right on the large array. An edge.elemenC of 1.0 is 
specified so that if the window is positioned such that it lies 
partially outside the large array, the elements of the window not 
overlapping the array will be set to 1.0. The exception 
NONEXISTENT _ARRAY_ERR0R is raised if the large array on which window 
is to be stationed has not yet been created or opened. 
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With each variable of type UINDOU , the following data structures 
are visible to the task declaring the variable: 

type UINDOU is access WINDOW_DESC; 

type UINDOU_DESC is 
record 

win : /lATJUX ACCESS; 

info : aiNDOU_INFO; 

end record; 

Thus a window is viewed by the user as a local array of specified size 
(the component win) along with information which is private to the 
package (the component info) . The processing of the elements of the 
large array visible through the window is hence performed in a manner 

analogous to the processing of an ordinary small array. That is to 

say, the accessing of window elements is done through subscripting the 
component win relative to the origin of the window rather than the 
origin of the large array. In the following statement, the value of 

the (2,3)th element of the A_window is assigned to the (4,l)th 

element of X_window: 

X_window.win(4,l) := A_window.win(2,3); 

The position of the windows on the large arrays determines tlie actual 
elements of the large arrays used in the above assignment statement. 
Thus if the X_window is positioned at the fifth column of the large 
array X then X(4,5) is actually assigned. 
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Window movement 

Two methods of window movement are provided; absolute and 
relative. The SET procedure moves the window to the indicated 
absolute position on the array. Thus SET can be used to establish 
the initial position of the window on the array. In the following 
statements, the A_uindou is set at position (1,13 on the large array 
A so as to cover the whole array while X_uindow is- set at position 
(1,53 so as to cover the 5th column of the large array Xt 

SET(A_window. neu_row -> 1, new_col -> 1); 

SET(X_window, neu_rou => 1, new_col => 5); 

The MOVE procedure, on the other hand, uses the row and column 
increments defined while creating the window, to move to a new position 
relative to the present position. The procedure MOVE just calls 
SET with the new absolute position. The statement 

MOVE CX); 

uses the move increments specified during the creation of X_windou to 
move the window one column to the right, i,e,, the 6th column of large 
array X. 

Whenever a window is moved, a copy of the elements visible through 
the window is provided in the local array win (component of 

UINDUU_DESC) , When the window is moved again, the values of the 
elements are written back to the external large array before the move 
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is executed. Thus each task processes its private copy of the elements 
visible through the window. The actual array is updated only when the 
window is moved away. The reading and writing of the window elements 
is obviously dependent on the privileges of the window, i.e., no 
writing is done for read-only windows while no reading is performed for 
write-only windows. 

Procedure READ can be used to copy the large array elements 
covered by the window into the local array of the window without 
actually moving the window. Similarly procedure WRITE is available 
to update the large array from the local array without moving the 
window. 

Depending upon the size of the window and the position of the 
window within the large array, the window may partially or fully cover 
one or more blocks of the array (see Figure 21. Internally lor each 
window a list of blocks covered by the window is maintained as 
block_2.isC. The procedure READ makes an entry call to each of the 
tasks associated with the blocks in this list sequentially, to update 
the appropriate portions of the window. Similarly, WRITE makes an 
entry call to the tasks for updating the appropriate portions of the 
blocks. 

Uhen a window is to be moved to a new position, several steps are 
performed by the procedure SET. First the elements at the present 
position are updated (for update windows only) by calling the WHITE 
procedure. The procedure RELEASR^BLOCKS is then called to detach the 
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window from the blocks it was covering. RELEAS£_ BLOCKS makes entry 
calls to the. Casks controlling Che blocks covered by the window Co 
detach the window from the blocks. Note procedures VRITE and 
RELEASE _BL0CKS are called only if the block_list is nonempty. An 

empty block_list indicates that either the window has not yet been 
initially stationed on the array or that its position is such that it 
is totally outside the array and hence is not covering any of the 
blocks of the array. 

Next the blocks that would be covered by the window in the new 
position are determined. SET calls the procedure OBTAIN_BLOCKS for 
this purpose. -OBTAIN_BLOCKS determines the position of the window 
with respect to the large array and the blocks covered by the window at 
this position. Entry calls are made to the tasks controlling these 
blocks to "attach" the window to the blocks. If another incompatible 
window covers the same elements, the “attach" call is delayed until the 
other window is moved. Once attach entry calls for all the blocks 
covered by the window have been successfully completed, the elements at 
the new position are read into the window via the procedure READ (for 
read“only and read~write windows onlyj, thus completing the operation 
of moving a window to a new position. The procedures RELEASE_BLOCKS 
and OBTAIN_DLOCKS are not visible to the programs using the package. 

5.1. i. Subwindows 

Subwindows can be overlayed on a window via the procedure 
CHEATE_SUBU1ND0U provided for this purpose. The window with which 
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the subwindoM is to be associated, its size, and its position within 
the window are passed as parameters. The following statements 


A_sw : array of SUBHINDOU; 

X_sw : SUBUINUUW; 

for i in l..n 
loop 

CREA TS_SUBWINDOUfA_SM(i ) , 

rot>r_size -> ri, col_size => 1, 
rou_pos => 1, col_pos => i, 
uind => A_window ); 

end loop; 

CREATE _SUBWIND0U(X_su, 

row_size => n/2, col_size => 1, 
rou_pos => 1, col_pos => 1, 
uind => X_uindou }; 

declare a subwindow X_sw and an array of subwindows A_su. Each of 
the A_su subwindows is created to be a n by 1 subwindow associated 
with the window A_uindou . The i’th subwindow A_swCi) is overlayed 
on the i’th column of the A_uindou . The X_su subwindow is created 
so as to cover the top n/2 elements of the X_uindou. 


The subwindow should lie entirely within the window with which it 
is associated, otherwise the exception SUBUIND0y_0UTSIDE_UINU0U is 
raised. For each window a list of subwindows overlaying it is 
maintained. On creation the subwindow is attached to the end of the 

subuindow_list for the associated window. 


Procedure ASSIGN can be used to assign a value to a particular 
element of the subwindow. Function GET returns the value of the 
specified element. For example, here 
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ASSIGNCX.sw, 4, 1, GET(A_su(5). 1, 1) 

results in the first element of the bth A_sw subwindow being assigned 
to the 4th element of X_su. These two subprograms work directly on 
the window to which the subwindow is attached since a subwindow does 
not have its own private copy of the elements. 

Note that there are no procedures for moving subwindows. Only the 
window on which the subwindow is overlayed can be moved. Thus when a 
window is moved, conceptually all the subwindows attached to it are 
automatically moved while preserving their relative positions. 

S.l.i. End oi Structure Functions 

The functions EOS can be used to determine if a window or 
subwindow has reached the end of the structure, i.e., the edge of the 
large array on which the window is stationed. A window can be totally 
inside the large array or totally outside it. The window can also be 
straddling the large array along one of the edges or one of the 
corners. The functions EOS returns a value of the enumeration type 
DIRECTIONS. A value INSIDE is returned if the window or subwindow 
is totally inside the large array while OUTSIDE is returned if the 
window or subwindow is totally outside the large array. The compass 
directions are returned for the other eight possible positions of the 
window or subwindow with respect to the large array. Thus N is 
returned when the top edge is straddled, NU is returned if the top 
left hand coiner is straddled and so on. 
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The LARGE_ARRAY_FACKAGE also provides various other functions to 
determine the properties of large arrays, windows and subwindows. The 
package has been described in greater detail in [10]. 

5 . 2 - Mutual Exclusion and Deadlock 

The semantics of the Ada tasking facilities have been used to 
provide mutual exclusion to tasks wishing to access the same block of 
the large array simultaneously. Since an independent 
BL0CK_C0NTR0LLER task controls each of the blocks, the 
BL0CK_C0NTR0LLER task performs the role of a "monitor" for the block. 
As long as two tasks are moving their windows oyer different parts of 
the large array, the movement can be performed in parallel. It is only 
when the two windows have to move onto the same block that their 
movements are carried out in a sequential manner. Also the reading and 
writing can be performed by only one task at a time. This does not 
inhibit parallel activity, since once a copy of the relevant portion of 
the block has been made into a window, independent tasks can process 
their windows concurrently. 

Windows can be moved asynchronously over a large array by the 
tasks owning the windows. In moving the windows, the tasks are 
requesting "resources", i.e., making attach entry calls to the tasks 
controlling the blocks of the large array in ein independent and 
asynchronous manner. A task can be blocked if the movement of the 
window would cause an incompatible overlap with another window already 
stationed on the block. Thus a deadlock could potentially occur if two 
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windows each simultaneously request to be attached to blocks on which 
the other is already stationed in such a fashion as to cause an 
incompatible overlap in both blocks. Such a deadlock is avoided by a 
simple resource ordering strategy. The blocks of a large array are 
numbered sequentially and must be requested by each task in a fixed 
sequence. A SET operation (and hence the MOVE operation) proceeds 
by first releasing the blocks covered by the window in the old position 
and then attaching the window to the blocks in its new position in the 
order of the block numbers. No reading or writing of the window 
elements at the new position may occur until the window has been 
successfully attached to all the blocks that it has to cover. This 
ensures that two windows cannot each be vying for a block already 
covered by the other window, thus avoiding deadlock. 

5-3. Block sizg 

Each large array is internally divided into blocks. The size ol 
the blocks is dependent on several competing factors. Since the block 
is used as a unit for transfer of data between secondary storage and 
primary memoryCs), the optimum size for I/O transfer influences the 
block size. Each block is controlled by an independent task. Thus if 
the block size is small, the number of tasks controlling the blocks 
will become large. On the other hand, if the block size is large then 
the concurrency available during the movement of windows is reduced. 
The trade off between the amount of parallelism and the number of tasks 
has to be weighed in conjunction with the optimum I/O transfer size 
when determining the size of the blocks of the large arrays. 
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ti. Array Processing ; An Example 

An algorithm for solving a triangular system of equations is 
elaborated in this section. The equations are represented by the 
following matrix equation; 

Ax = b 

where the matrix A is an n by n upper triangular matrix. The 
algorithm, coded in Ada (see Appendix B3, essentially uses a direct 
back solve method for solving the equations for a set of m right hand 
side vectors b. 

The algorithm utilizes h subtasks for solving the system of 
equations ~ n~l BACK_SOLVE subtasks and one HAIN_BACK_SOLVE . The n 
subtasks are conceptually arranged in a linear chain, each having a 
right and left neighbor (except of course the subtasks at the end 
points). The subtask MAIN_BACK_SOLVE is the n’th subtask in the 
chain. 

The algorithm is set up so that each subtask views one column of 
the array A and calculates one element of the solution vector x. The 
subtask HA1N_BACK_S0LVE ’’receives" the right hand side vector b and 
calculates the n’th element of x using the n’th element of b and the 
A(n,n) element. It also calculates the contribution of the n’th column 
of the array A to the solution vector and passes on the partially 
calculated x vector to its left neighbor. The id’th subtask BACK_SOLVE 
receives the partially calculated x vector from its right neighbor, 
calculates the id’th element of the x vector and passes on the 
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partially calculated values of x, updated by contribution of the id’th 
column of the array A, to its left neighbor. 

The procedure MAIN declares A , X and B as LARGE_ARRAYs 
and opens A and B, associating them with already existing external 
files A_file and B_file respectively. It also creates a new file 
X_tile , to be associated with the LARGE_AHRAY X. A read-only 
window A_windou , is also defined by MAIN and associated with the 
LARGE_ARRAY A - The window A_uindou is as large as the array 

itself and is positioned so as to cover the whole array. 

Each of the BACK_SOLVE subtasks declares a subwindow A_su on 
the window A_uindow and overlays it such that the subwindow for 

subtask id is at the id’th column of the window (see Figure 3(a)]. 
Also a one element write-only window is declared by each subtask on the 
LARGS_ARRAY X and initially positioned on the first column of the 
id’th row. 

The subtask MAIN_BACK^SOLVE , in addition to the A_sft subwindow 
and the X_uindow also has a window, B_uindou, on the LABGE_AHHAY 
B (Figure 3(c)). This window is initially positioned at the first 
column of the array B. After setting the B_uindou on the first 
column of the B array, the subtask calculates the n’th element of the 
X vector and writes it in the X_uindow. It then passes the partially 
calculated n-1 values to its left neighbor, subtask n-1, and then moves 
the B_windou and X,windou one position to the right on the 
respective arrays so as to process the next b vector (Figure 3(b) 8t 
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3(c)). Note that the move increments for both the windows are 
appropriately set when defining the windows. Processing is continued 
until the end of structure is reached ( EOS(B_UINDOW) = OUTSIDE) . 

The id’th subtask BACK_SOLVE receives id elements of the 

partially calculated x values from its right neighbor and generates the 
id’th element of the x vector in the X_window. It communicates 
partially calculated x values to its left neighbor and then moves its 
X_uindoM (Figure 3(b)). This is repeated until all the right hemd 
sides have been processed. For each right hand side vector, the id'th 
subtask produces the id'th element of the corresponding x vector. Thus 
the end of processing is signified by the fact that the X_wiadou has 
been moved outside the X array ( EOS(X_uindou) = OUTSIDE) . Uhen all 
subtasks have finished processing the main program closes all the 
large_arrays. 

As noted before, the above algorithm has been described without 
assuming an underlying architecture. An implementation of the 

algorithm on the NASA Finite Element Machine is described in [9]. 

7. Conclusion 

The minimal support for large data structures in proposed 
languages for multiprocessor architectures has led to the present 
investigation. The basic aim of the conceptual model presented in this 
paper, is to lay the groundwork for incorporating the concept of large 
arrays into parallel languages. The model provides the user with an 
easy means of specifying not only the distributed storage but also the 
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distributed processing of large arrays in a high level language without 
resorting to low level coding within his program. 

The basic data structure "large array" is the same as that of an 
ordinary small array, but it is clear that the manner in which a large 
array is to be processed in a multiprocessor environment cannot be the 
same because of the performance realities involved in its processing. 
A similar dichotomy caui be seen in sequential languages where the two 
concepts of files amd vectors are not merged even though files can be 
regarded as vectors of records. 

The conceptual model for parallel processing of large arrays, as 
presented in this paper, is an attempt to fill the gap between what is 
provided by the present day parallel languages and what is needed by 
the users of such languages. The central concepts of the model, 
windows and subwindows, provide the means for representing both the 
distributed storage and concurrent processing of large arrays in 
programs in a way that is natural for array processing. 

The Ada large array package was implemented so as to specify the 
semantics of the above concepts in more concrete terms. Various 
factors dictated the choice of Ada as a vehicle for the specification 
of the semantics. Ada provides excellent facilities for implementing 
abstract data types through generic packages. The Ada tasking 
facilities can be used to represent a virtual abstract multiprocessor 
system. Since the thrust of this research was the study of large 
arrays in a parallel environment rather than the use of concurrent 
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systems per se, Ada tasking facilities prpvided a base on which the 
large array model could be implemented. 

The Ada package provides a particular implementation of the 
general model for parallel processing of large arrays without any 
assumptions about the underlying architecture. However the performance 
realities associated with each particular architecture have to be taken 
into account when considering the implementation of the model on a 
particular machine. The model provides a general framework fdiich can 
be tailored to fit the nuances of a particular architecture. The 
strategies for implementing the model on four architectures namely: the 
NASA MPP [3J, the Intel 432 system (IJ, the NASA Finite Element 
Machine 118], and the University of Maryland ZMOB U5] have been 
described in [101. 
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APPENDIX A 

Ada Generic LARGE_ARMY Package 


The following generic package, LARGE_ARRAY_PKGE, 
can be used for parallel processing of large arrays. — 
The package defines the needed data types and 
subprograms for such processing. 

The package can be instantiated with any actual type — 
corresponding to the formal parameter ELEMENT. 


generi c 

type ELEMENT is private; 
package LAHGE_APRAY_PKGE is 
type LARGE_ARRAY is private; 

type MATRIX is arrayC INTEGER range <>, INTEGER range <>) 

of ELEMENT; 

type MATRIX_ACCESS is access MATRIX; 

type hJINDOU_INFO is private; 

type WINDOU_DESC is 
record 

win : MATRIX ACCESS; 

info : UIND0W_1NF0; 

end record; 

type UINDOU is access ilINDOU_DF.SC ; 
type SUBUINDOU is private; 
type PRIVILEGES is (R,W,RU); 

type DIRECTIONS is (N,S,E,U,NE.mi,SE,SU, INSIDE, OUTSIDE); 
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Procedures to create or open large arrays 

procedure CREATE_LAHGE_ARHAY( ar : in out LARffE_AHI{Ay; 

row_low_bd, 

roM_bigh_bd, 

col_low_bd, 

col_hisrh_bd : in INTEGER; 
name s in STRING); 

procedure OP£N_LARGE_ARRAY ( ar : in out LARGE_ARRAY; 

row_lou_bd, 

row_high_bd. 

col_low_bd, 

colJiigh_bd .• in INTEGER; 
name : in STRING); 


Procedures to delete or close large arrays 
procedure DELKTE_1ARGE_ARRAY( ar; in out LARGE_ARRAY); 
procedure CLOSE_LARGE_AHRAY ( ar: in out U[RUE_ARRAY) ; 


Procedures to create uindous and subwindows 


procedure CREATE_UINDOW( wind 

row_size, col_size 
inmode 

row_inc, col_lnc 

ar 

edge 

edge_element 


in out UINDOU; 
in NATURAL; 
in PRIVILEGES; 
in INTEGER; 
in LARGE ARRAY; 
in BOOLEAN; 
in ELEMENT); 


procedure CREATR_SUBUINDOU( subwin , in out 


row_size. 

col_size. 


row_pos, 

col_pos ; 

in 

inmode 


in 

wind 


in 


SUBUINDOU; 

NATURAL; 

PRIVIIEGSS; 

VINDOU); 
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- Procedures to move windous. 

~ Note that a move implies 

writing the last position (il the window is not read-only! 
reading the new position (il the window is not write-only). 

procedure NOVR( wind : in out UINDOW); -- relative movement 

procedure SKT ( wind : in out WINDOW; 

new_row, 

new_col : in INTEGEft ); — absolute movement 

- Procedures to read and write windows without moving them. 

procedure HEAD ( wind: in out WINDOW); 
procedure WRITEC wind: in WINDOW); 


- Subprograms to assign and retrieve values of specified 


— elements of subwindows. 

procedure ASSIGN (subwin 

row, col 
value 

function GET ( subwin 
row, col 


: in out SUBWINDOW; 

: in NATURAL; 

: in ELEMENT); 

: in SUBWINDOW; 

: in NATURAL) return ELEMENT; 


— Functions 


to determine the end of structure. 


function EOS(wind: in WINDOW) return DIRECTION!; 
function EOSCwind: in SUBWINDOW) return DIRECTIONS; 


- Various functions to determine properties of large arrays, 
windows and subwindows. 


— Exceptions that can be raised in the package. 


ARRAY_SIZE_ERROR 
ARRAY_D1FFERENT_ERRQR 
READ WR1TE_M0DE_ERR0R 
NONF~XISTENT_ARRAY_ERROR 
NONEXISTENT WINDOW ERROR 
SUBWINDOW_OUTSIDE WINDOW 


exception; 

exception; 

exception; 

exception; 

exception; 

exception; 


end LARGE_ARRAY_PKGE; 
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APPENDIX B 

Ada Program for the Back Solve Process 

with LABGE_ARBAY_PKGE; 
procedure /IAIN is 

package ELOAT_ABBAY is new LARGE ARHAY_PKGE( FLOAT); 
use FLOAT_AJmY; 

n ; constant NATURAL ; ~~ Size of the A matrix 

m : constant NATURAL ; ~~ Number of right hand sides 

A.X.B : LARGR_ARRAY; 

A_window : UINUOU; 

begin 

OPEN_lsARGE_ARRAY( A, row_low_bd -> 1, row_high bd n, 

col_low_bd => 1, col_high bd => n, 

name => "A_fileV: 

OPEN_LARGE_ARRAY( B, row_low_bd => 1, row_high bd -> n, 

col_lou_bd => 1, col_high_bd => m, 

name => "B_file‘‘); 

CRhAI E_LARGE_ARRAY( X, row_low_bd -> 1, row_high bd => n, 

col_low_bd -> 1, col_high bd => m, 
name => "X_file"); 

CREATE_CJINDOV( A_window, row_size => n, col^size => n, 

inmode => R, 

row_inc => 0, col_inc => 0, 

ar => A, 

edge -> FALSE, edge_element -> 0.0); 

SET(A_window, new_row => 1, new_col ~> 1); 

declare 

type VECTOR is array! l..n) of FLOAT; 

task type BACK_SOLVE is 

entry UHO_AM_l(self id : NATURAL); 
entry NEXT( x : VECTOR); 
end DACK_SOLVE; 

task /IAIN_BACK_SOLVE is 
end MAIN_BACK_SOLVF; 


sol ve 


: arrayfl..n-lj of BACK_SOLVE; 
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ta^-k body MAIN_BACK_SOLVE is 

B_mndou, X_uindow : UINDOU; 

A_sw : SUBaiNDOU; 

partial_x .• VECTOB; 

begin 

CREATE_SUBUINDOU(A_su. rou_size => n, col_size *> 1. 

row_pos => 1, col_pos => n, 

wind => A_window ); 

CREATE_UINDOU(X_window, row_size => 1, col_size => 1, 

inmode => U, 

row_inc => 0, col_inc => 1, 

ar -> X, 

edge => FALSE, edge_element => 0.0) 

SET(X_windou, neu_row => n, new_col => 1); 

CREATE_UlNDOll(B_window, row_size => n, col_size -> 1, 

inmode => R, 

row_inc => 0, col_inc => 1, 

ar => B, 

edge => FALSE, edge_element => 0.0) 

SET(B_uindou, new_row =•> 1, new_col => 1); 

solve_cycle: 

loop 

X_window.win(l,l) := B_uindow.winCn, 1) / GET(A_sw,n,l); 

for i in l..n~l 
loop 

partial_x(i) (B_window.win(i,l) 

- GETlA_sw,i,l) » X_window.win(l,ljJ; 

end loop: 

sol veC n~l ),NEXT( partial _x) ; 

tfOVECB^window) ; 
tIOVE( X_window) ; 

exit solve_cycle when (E0S(B_window) = OUTSIDE)} 
end loop solve_cycle; 


end HAJN_BACK_SOLVE; 
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task body BACK_SOLVE is 

X uiridou : UINDOCJ; 

A_sw .* SUBUINDOU; 

id : MTURAL; 

partial_x : VECTOR; 

begin 

accept UH0_AM_1 ( self_id : NATURAL) do 
id := self_id; 
end UHO_AM_I; 

CREATE_SUBUlNDOU(A_su. row_size -> n, col_size => 1, 

row_pos 1, col_pos => id, 
wind =•> A_window ); 

CREATE_aiNDOWCX_window. row_size => 1, col_size => 1, 

inmode => V, 

row_inc -> 0, col_inc => 1, 

ar => X, 

edge => FAISIE, edge_element => 0.0); 

SET(X_window, new_row => id, new_col => 1); 

solve_cycle: 

loop 

accept NEXT(x : VECTOR) do 
for i in 1. .id 
loop 

partial_x(i) ;= x(i); 
end loop; 
end NEXT; 

X_window.win(l,l) partial_x(id) / GET(A_sw,id,l); 
for i in l..id-l 
loop 

partial_xCi) (partial_x(i) 

-GET(A_sw,i,l) • X_window.win(l,D); 

end loop; 

if id /- 1 then 

solvef id-1) .NEXT (part ial_x) ; 
end if; 

mVE(X_window) ; 

exit solre_cycle when (EOS(X^window) - OUTSIDE); 

end loop solve_cycle; 
end BACK_SOLVK; 
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begin — declare 

lor i in l..n-l 
loop 

sol ve(i). WHO_Att_l(i ) , 
end loop; 

end; — declare 

CLOSE LARGE_ARRAY(X); 
CLOSE_LAHGE_ARRAY(Bh 
CLOSE_LARGE_AHRAY(A); 


end MAIN; 
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